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ABSTRACT 

We present a new algorithm to rapidly and optimally compute power spectra. This new 
algorithm is based on a generalization of iterative multigrid, and has computational 
cost ©(-/V log TV), compared to the standard brute force approach which costs 0{N^). 
The procedure retains this speed on the full sky and for ill-conditioned matrices. It 
is applicable to galaxy power spectra, CMB, polarization and weak lensing data. We 
present a mathematical convergence analysis, and performance results. 
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1 INTRODUCTION 

Cosmological data sets are rapidly becoming very large, making the optimal analysis of data a challenging problem. Particularly 
large problems arise in the analysis of CMB data, galaxy power spectra and weak lensing data. For Gaussian random fields, the 
statistics are fully solvable, and the construction of optimal quadratic or maximum likelihood es timators is uniquely d efined. 

A full maximum likelihood analysis of COBE which contained 3890 pixels was done bv iBunn fc Whita JiQOtII . The 
optimal analysis procedure requires 0{N^) operations for N data points, which is generally very expensive, and prohibitive 
for many data sets . Data com pression is possible through the use of a Karhunen-Loeve expansion in terms of "signal-to- 
noise" eigenmodes llBondlll995l) . The first compression step still costs 0{N^), but subsequent analyses are accelerated. Faster 
procedures have been suggested, usually at the expense of optimality. The co mputation of correlation functions, f or example, 
only costs 0{N log N) operations, and has been a popular quick algorithm JPen et al.ll200i ISzapudi et al.ll20oJl. Similarly, 
a weighted Fourier Transform of the data is equally fast, and was proposed bv lFeLtoianer^dT ' i^9^) ~ ^en et alJ ll2003l) 
summarized the trade-off made in these approximations. 

We consider a power spectrum estimator optimal if it minimizes the variance. The variance of the power is a four point 
function, which can be expressed in terms of two point functions fo r a Gaussian random field. One can show that a naive 
quadratic estimator applied to the Wiener filtered data set is optimal (|Seliaklll99Sll . If the noise and signal covariance matrices 
commute, the correlation function and weighted Fourier techniques are optimal. If the signal-to-noise is less than unity on all 
scales, the correlation function is optimal. But in general none of these cases applies, in which case the historic fast estimators 
are suboptimal. 

In this paper we present an algorithm which is fast and optimal, and wor ks on a wide range of proble ms, geometries, and 
condition numbers. It is an extension of the iterative technique presented in IPadmanabhan et alJ (|2002l ) to cover regimes of 
poor condition number. 

The basic premise is that the forward operation of applying the signal and noise correlations to the data set an be done 
in 0(A''logA'^) time. We then construct iterative techniques to achieve the inverse. 



2 MAXIMUM LIKELIHOOD 

In this section we review the mathematical formulation of the power spectrum estimation problem. An observation is a data 
set of points sampled at positions Xi, written as a vector A^. We take this vector to have zero expectation value. This 
vector is assumed to be the sum of signal plus noise, such that the expectation of the covariance matrix is 
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{A,A,) = C,^ + Cf, (1) 

for a noise covariance matrix and a signal covariance C'^. We parametrize the signal covariance matrix as a sum of Ub 
band powers, 

C^ = ^PcCc. (2) 

Q = l 

The log of the probability of the data given a model is 



£ = log |C™ + C-^ I - " '^^ ' " + ^ log(27r) (3) 

We wish to find the set of parameters pa which maximize the likelihood ^ . Differentiating, one obtains a system of equations 
- TrKC"^ + C^] + A'(C'^ + C^)-'Ca(C'^ + C^)-'A = 0. (4) 

The Newton-Raphson solution for @ starts with an initial guess p% and iterates 

P/3 =P0-UG^is (5) 
where fa is the left hand side of Q and G is the Jacobian 

Gap = 2Fai3 - A*(C'^ + C^)-'C<,(C'^ + C^)"'C^(C'^ + C^y^A - A*(C'^ + C'')"'C0(C'^ + C^)"'C„(C'^ + C'')"' A.(6) 
F is the Fisher matrix defined as 

Fap = iTr[(C^ + C^)-iC^(C^ + C^)-^C„]. (7) 

Newton-Raphson only convergences in the domain where the sign of the curvature does not change. But even for one 
degree of freedom, the Gaussian likelihood encounters an inflection point. Consider 

21og/: = -log(27r)-21og(a)-^. (8) 

We solve the equation 
1 A^ 

/(.) = --^=0. (9) 

The solution is — A^, as one would expect. In one dimension the Jacobian is the derivative, /'(cr) = — l/cr^ + SA^/cr*, 
which changes sign when = 3A^. So the Newton-Raphson iteration (O moves towards the correct root for guesses within 
a factor of 3 the true solution, but moves towards infinity for guesses that are too large. 

An alternate representation is in terms of weighted quadratic estimators, where one first weights data by W and defines 
raw estimators as 



A'W'C^WA - Tr[WC„WC]v]. (10) 



The estimators of power Pa ~ F^^qfj. A Wiener filter choice is W = (C^ -f C^)~^. If interpreted in terms of the maximum 
likelihood iteration it amounts to an initial guess pjj — 0, with residual = qa- We used the expectation value (G) — F. 
This converges after a single iteration. Of course, we had assumed that we already knew the optimal weights for the Wiener 
filter. The quadratic estimator IIUII does not have convergence limitations. One can start with an approximate guess for the 
covariance C'^, and iterate. 

In both the maximum likelihood and the quadratic estimation processes, a number of expensive operations are required. 
The most common one is the solution to the Wiener filter, (C^ -I- C^)"^ A. We will discuss in section|21how to achieve using 
a fast algorithm, after which we only need to address the trace evaluation, which we do in section |S| 



3 ITERATIVE SOLUTION 

For stationary processes, the correlation matrices depend only on the separation distance between points. This makes them 
diagonal in Fourier space. Noise matrices are often diagonal in real space, or in a time stream space. For now we assume 
that a fast 0{NlogN) method exists to evaluate the forward operation. In sectionQwe present several fast procedures which 
imlement this. 

For concreteness, we assume that the signal correlation is a diagonal matrix in Fourier space, while the noise matrix is 
diagonal in real space. We concentrate on the problem 



{S + N)x^y. 



(11) 
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The first requirement is that (IIH can be evaluated quickly. This is demonstrated for a range of problems in section 2] We 
will express all vectors in real space. We can evaluate y in 0{Nlog{N)) operations as y = Na; + Sx. The underrelaxed Jacobi 
iterative solution is 

^n+i ^ 2;" + {SI + N)-i [y - (S + N)x"] . (12) 

S is our relaxation parameter, to be determined below. Define the error of the n— th iteration as the difference between the 
true solution x and the current guess, e" = x" — x. Substituting into 1121 . we have 

e" = [I - (SI + N)"'(S + N)]"e" = E"e°. (13) 

E is called the error matrix. We define the spectral radius of a matrix M as p(M) to be the maximum absolute eigenvalue of 
M. Since p(A + B) ^ p(A) + /5(B), the sufficient condition for 11311 to converge is 

5>p(S). (14) 

We can prove H14|l by expanding the matrix product in H13|l into two positive definite terms each with spectral radius less 
than unity, p[(5'I + N)~^N] < 1 and ^[(51 + N)~^S] < 1. The first inequality is trivial since the two terms are simultaneously 
diagonal. The second inequality follows by expanding its inverse, and noting that the smallest eigenvalue of the inverse is 
larger than one. 

When N is diagonal, we obtain optimal convergence with S = (5*0 + Soo)/^, i.e. the average of the largest and smallest 
eigenvalue of the noise matrix. For practical purposes, we have never encountered convergence problems with this choice, and 
always found better convergence than using the hard bound p4|l . 

In real space, l|13|l can be visualized as follows. We take an error vector e", multiply one copy by the pixel noise, and 
convolve another copy with the two point correlation function. We add the two resulting vectors, and divide by the pixel 
noise plus a constant corresponding to the amplification factor for the wave with the largest signal power. We then subtract 
this from the original vector e". The convergence is rapid if it pointwise converges to zero. The pixels which are slowest to 
converge are those where the noise is much smaller than S when we inject a wave corresponding to the smallest eigenvalue of 
the signal correlation. The worst pixel converges as 

le"Uoc[l-(iVo + So)/(iVo + S)]" (15) 

where A'o is the smallest eigenvalue of N, 5*0 is the smallest eigenvalue of S, and S is the largest eigenvalue. It takes 
~ (A'o + S)/{No + ^o) steps to decrease the error by an e-fold. The rate is related to the condition number of S, and the lower 
limit for N. It is apparent that choosing S larger than necessary slows the convergence down. 

The convergence rate is limited by the condition number of S + N. Consider the problem of power spectrum estimation 
on an irregular region with constant noise. We can express this problem on a larger enclosing domain, and write A'^^^^ = on 
the padding region. N^^ is not invertible in this case. But we can write 

(SN"^ + I)Na; = y. (16) 

Setting u = Na;, we first solve 1161 for u, and then evaluate x — N~^u, where zero entries on the diagonal of N^^ becomes 
a non-problem. The problem instead arises when the noise is small. Large and small is defined relative to the signal. Since 
the signal covers a spectrum of values, we shall use the mean value Sm as reference point. We factor N = HL, where H = N 
when N > Sm and equal to one otherwise. We can now generalize 1)16^ to 

(SH-^ + L)Ha; = y, (17) 
and define u = Ha;. The iterative solution becomes 

yn+i ^ + (_5H-i + L)-' [y - (SH"' + L)zi"] . (18) 

The convergence criterion 1)14^ is still a sufficient condition for the Jacobi iteration l|18|l . but may be quite non-optimal. We 
can instead use an iterative estimate of S — p(SH~^) using a power iteration. To prove the convergence of 1181 . we use the 
same argument as for 11411 . with the second inequality holding since HS~^H~^ is a similarity transformation on S which 
preserves eigenvalues. Should S have a large dynamic range, we can similarly factor it S = TU 

(TH~^ + U~^L)Ha: = U"^y, (19) 
with an iteration 

= + (f + L/^'L)-' [y ~ (TH"' + \J-''L)u"] . (20) 

A similar argument shows this to be a convergent iteration if we choose T = p{T) and U — l/p(U~^). 

The iterative approach also allows us to compute the ~ A'^(C^ + C®)~^A of the solution iteratively, as well as any 
expression involving the contraction of vectors, such as llOII . The likelihood itself requires computing a determinant, which 
does not fall into this framework. The trace evaluations needed in Equations 14I6II will be addressed below. 
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4 CONVOLUTION AND BOUNDARIES 
4.1 Multi-level FFT 

Consider a point set contained within an irregular domain Q with isolated boundary conditions. We imbed this domain fl in 
the smallest enclosing rectangle L. The multiplication by the signal matrix is just a convolution by the correlation function 
with isolated boundary conditions, which can be performed using an FFT on a domain twice as wide as L. The operation 
y — Sx sets the value at every point yi to the sum over all points x weighted by the correlatio n function at the separation . 
We can map any discrete set of points onto a regular grid using an interpolation of some order ijHocknev fc EastwoodllToSS^ . 

For this iterative technique, we only rely on each of the forward operations to be evaluated rapidly. The operation of the 
signal on the data can always be performed in C'(A'^ log A'') time on a spatial grid by considering it a convolution, which can 
be performed on a tree, even if the geometry is complex, the coordinate system is unevenly sampled, or the coordinates are 
otherwise not appropriate for a Fourier transform, for example on the celestial sphere. Again, several methods are possible, 
with tradeoffs in complexity of the algorithm against computational time. As discussed in lPadmanabhan et alJ l)2002ll . several 
levels of grids can be employed to accelerate the pr ocess. Open boundaries can be implemented, and one can add short range 
pair summations, much in analogy to N-body codes ([Hocknev fc EastwoodllQSSi) . If correlations on a whole sphere are desired, 
a spherical harmonic transform can be applied for convolutions on the coarse grid followed by FFT's on the fine grids. While 
spherical harmonic transforms cost 0{N^^^), the base is only the number of coarse grid cells, and one expects to be dominated 
by the C'(A'' log A'') of the fine grid in general. 



4.2 Tree 



Just as in N-body simulations, the optimal algorithm depends on the clustering of points. An alternative way of evaluating 
the signal correlation applied to a data vector SA is through tree summation. This approach is very general, and does not 
slow down under strong clustering. A stationary signal correlation is a convolution, and we can express it as 



U{X^) = SA = ^ A{Xj)G{\Xi - Xj\). 



(21) 



While Equation 1)21^ is an A'^'^ process, we note that it can be approximated to high accuracy through a multipole expansion. 
Points can be bunched together into nodes. Each node has the mean position of its constituent points (center of mass), a 
quadrupole moment, and the sum the masses. So instead of summing over all points, it is sufficient to sum over a smaller 
number of nodes. Let us describe the mass distribution of node k by pk{x). The convolution over the node is 



u{Xi) 



SA = ^y" dxpk{S)G{\x,^x\) 



E 



MkG(\x, - Xk\) + / Pk[x)r°'r''d^df,G[\x, - Xk\) + 0{G"') 



(22) 



The mass Mk = /pfc, and dipole term was cancelled by the choice {xk) — J xpk{x), and The second term in l|22^ can be 
rewritten in terms of the quadrupole moments Qap — j r^r^ pk with r" 



u(xi) = 



k 



MkG{\xi - Xk\) + 



lap 



al3 



,G'{\x, 



- Xk 



■ Xk 



+ f°'f''G"{\x,-Xk\) 



0{G"') 



(23) 



r = f/\r\ is the unit separation between the target point and the node center of mass. By comparing the quadrupole to the 
monopole, we obtain an estimate of the error in the truncation. Should this error be larger than our tolerance, we break the 
sum into subcomponents of the node. 

The tree needs only be constructed once, so we should invest sufficient computing resources into this step to minimize 
the truncation errors arising from the tree. One starts by defining a top node, which includes all points. Since our goal is to 
maximize locality of particles, we shall subdivide each node by cutting it in half by a line perpendicular to the major axis 
of the quadrupole. We want to place the cut such that the quadrupole of the subnodes is minimized. This cut line can be 
determined through bisection. One first cuts through the center of mass, and determines the quadrupole of each subnode. 
We then displace the cutline towards the node with the larger quadrupole. Let rm = vAmax be the square root of the largest 
eigenvalue of the parent node. We then displace the cutting line by an amount proportional to rm times 1 minus the ratio 
of quadrupoles of the subnodes. We repeat this process on each subnode until each node contains exactly one point. The 
resulting binary tree is not balanced, since the number of particles in each child node is not equal. The depth of the tree is 
not determined in advance. 

The truncation error in the quadrupole tree for a power law correlation function G{R) oc _R~" is 
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=n(n+l)(n + 2)(:^) . (24) 



G(R) ' '\R. 

Reducing rm by a factor of two reduces the error by a factor of 8. But the quadrupole is proportional to the node area, so the 
computational cost has increased a factor of 4. Let n2 be the number of desired binary digits of accuracy. Then the quadrupole 
truncated tree has a computational cost oc (2"^)^^^. If very high accuracy is desired, one should go to high multipole order. 
An order no multipole evaluation in two spatial dimensions costs 0{no) operations, so the total node cost cx Uq. At high 
multipole moments, it makes sense to reduce the node size since we gain rapidly. The optimal no is thus slightly smaller 
than 712. Asymptotically, the cost of the tree slightly less than n^. 

A simple tree decomposition to implement does not use the quadrupole information. For each node, we find the particle 
which is furthest from the center of mass. We store this distance with each node. We subdivide the node into two subnodes 
by a cut perpendicular to the line connecting the furthest particle to the center of mass. The cut is chosen such that the 
two subnodes have an equal number of particles. When walking the tree, we check that the maximal radius of the node is 
sufficiently smaller than the distance to the node. This tree is faster to build, since no quadrupole evaluations are required, 
and for each node the bisection only costs linear time in the number of particles in the node. It also minimizes the worst case 
error at each node. 



5 MULTISCALE ACCELERATION 

The convergence rate of equation 1181 is primarily limited by the condition number of S, since the relaxation parameter S 
is yields rapid convergence on modes with correlation power close to S. We can, for example, project the error e onto the 
eigenvectors of S. Since we chose S to be the largest eigenvalue of S, the corresponding eigenvector converges to zero error after 
one iteration. We thus consider breaking the iterative solution into blocks of the eigenspace of S, for which the corresponding 
eigenvalues have a small dynamic range. If we can use a different relaxation parameter for each block, the iteration proceeds 
rapidly. Let us write a complete set of ni, band power projection operators Pi which sum to the identity matrix. Multiplication 
by the projection can also be expressed as a convolution, for which the tree algorithm described above can also be used. We 
generalize the iteration from equation 1201 

=u" + ^(T.H-^ + U-'L)-'P, [y - {TU ' + U-^L)m"] . (25) 

i 

The error on the iteration is again 



I - ^(T,H-i + [7-iL)-ip,(TH-' + U-^L) 



(26) 



We set the relaxation parameters Ti = p(PiT) + Too and Ui = l/p(PiU ^) + Uoo- In the continuous Fourier decomposition 
limit, we have 

e"+^(x) = e"(x) - 1 d2fcrf2^'exp[ik.(x-x')]:^^^e"(x'). (27) 

The error matrix 12711 is zero on the diagonal. If there is no signal, P{k) — 0, the error matrix is zero everywhere, and the 
convergence is exact after one iteration. When the signal is dominant, we can neglect N{x), and the error matrix is again 
zero. Similarly, the error matrix is zero for two elements whenever N{x') = N{x). 

We can solve the error matrix exactly for two degrees of freedom. In this case, we write the noise matrix as 

The orthogonal Fourier transform matrix F is 

^-^(' ' 
We have two projection matrices 



(29) 



The signal correlation matrix is 

S.F-('^ » )f. ,31, 
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Figure 1. Convergence of Jacobi iteration and multiscale acceleration. Bottom horizontal label is the number of the iteration for the 
multiscale accelerated method, while the top axis is the iteration number for direct Jacobi iteration. The problem had 65536 variables. 



The error expression is then 

= {I - [((si + Soo)I + N)-^F-^Pi + {{S2 + soo)l + N)-^F-^P2](SF + FN)}" e° (32) 
For Soo = 0, the eigenvalues of this expression can be directly evaluated 

A = +i ^ {m-n,){s,-s,) _ ^33^ 

(ni + Si)(n2 + S2)(71i + S2)(n2 + Si) 

If we take rii and Si to be sorted positively in ascending order, we see the qualitative features of the convergence. Convergence 
is very rapid if either N or S is well conditioned. It is poor if m + si is very small compared to all the other differences in the 
problem, and can even fail to converge if |A| > 1. To treat the most extreme case, we set si = ni = 0. Then the iteration 1321 
will converge optimally for Sao ~ min(s2,n2), and the maximal eigenvalue is A < 1/2 for any choice of n^, S2. At least in this 
scenario, we have shown that even this potentially extremely ill-conditioned system converges rapidly, better than a factor of 
two on each iteration. 

In the continuum limit 1271 . we can estimate convergence in some limiting cases. If N{x') ^ Pik) is large the k integral 
turns the power spectrum into a correlation function, ,f(x' — x)/A^(x'), while if it is very small, the matrix element is given 
by the inverse correlation function Ai'(x)^~^(x — x'). In either case, the matrix is very small. 

When the signal matrix contains a large number of zero entries, setting Too to the mean value of the noise appears to be 
a robust choice. In general, we break the sum 12611 into logjdlSH) terms, each of which costs C'(Ai' log A'') operations. 

To test these concepts, we implemented a one dimensional example. 65536 grid points were used with periodic boundary 
conditions, and a power law signal correlation function P{k) = 256/fc, where k is the integer wave number on the grid. The 
noise variance in each grid point is a random number chosen uniformly from 0.35 to 1.35. We first run the straight iteration 
1121 1 for 500 iterations. The vector x was chosen randomly, from which y was computed. We define the scaled infinity error 
Coo = \x" — x\oo\/ 65536/ y^^- This error is plotted as boxes in figure In this situation, we have S ~ 128, 5*0 = 1/256 
and No — 0.35. From equation 1151 we expect to take up to about 366 iterations to reduce the error by an e-fold. This is 
consistent with the performance seen in figure Q 

Then we applied the multiscale accelerated iteration 12511 . We chose 16 bands, so the computational cost is about 16 
times higher, but the convergence is much better as we see in the crosses in figure Q The two sets of points have comparable 
computational cost, and the multiscale converges much better. We note that the convergence rate for the Jacobi iteration 
scales as the condition number, which is proportionate to the number of grid points, while the multiscale convergence is 
independent of that. The Jacobi iteration is still very fast compared to a brute force A^ solution of this problem. Each of the 
iterative methods here take a matter of minutes on a workstation, while a brute force method takes of order a year. 

It is instructive to compare this multiscale approach to traditional multigrid aceleration methods. We solve an arbitrary 
linear equation 
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Lu^b (34) 

for some linear operator L. The simplest procedure applies a Jacobi iteration 

= (1 - D-'L) ?i" + D-'b, (35) 

where D is the diagonal part of L. We then have a restriction operator TZ which maps the vector u onto a coarser vector 
ui — TZu. In this coarser space, we need to have a coarsened version of the operator, denoted L, for which we now solve the 
coarsened system 

L<+''" = A6^+' = n{h - l.u"+^'^). (36) 

One recursively applies multigrid to this coarsened system. For purposes of analysis, we assume that the coarse grid equation 
H3()|l was solved exactly. We inject this coarse grid correction using the coarse-to-fine grid prolongation operator V 

= [PL-'7^(l-D~'L)+D-'] (fe-Lii"). (37) 

The error can be written as 

e" = [(1 - PL-^7^L)(1 - D-^L)] " e«. (38) 

The standard Jacobi iteration only has the right term in parenthesis. For the Laplace operator L — = —k^, we can see in 
Fourier space that this term is dominated by small k, i.e. large scales, for which the term is unity, and convergence is slow. 
But on large scales, the restriction and prolongation operators basically commute with everything, and the first term is very 
small. So we see that convergence is rapid on all scales for the Laplace operator. 

In correlation problems, the linear operator is typically an inverse power of the wave number, L ~ for which the 

large scales actually converse rapidly, but the small scales don't. On these small scales, the prolongation and restriction 
operators introduce large errors, and the convergence is not aided by this multigrid decomposition. One can remedy this 
problem by multiplying both sides of the equation by a sufficiently large power of k. Or one can recast the equation IIH 
as (S~^ +N^^)Nx = S~^j/, and solve the system using only inverse matrices. But as we had seen in equations 11513311 . the 
convergence is generally limited by a bad condition number on the noise matrix N. Let us consider the above 2x2 example, 
choosing the case when ni = si = 0. For the coarse grid, the only conceivable operator is the scalar L = (s2 + n2)/2. The 
restriction to a one cell coarse grid is the average of the two cells, and vice versa. The iterator error 138|l is then 

1 52_ \ f Q -1 



°2 ; 1 £2_ ' ^ ■ ^^^^ 

The term in the right parentheses is the error for a standard Jacobi iteration without multigrid, and the left term is the 
multigrid acceleration. In a pure Jacobi iteration, the eigenvalues are A = ^yT^^^ ' which tends to 1 as n2 tends to zero 
and is thus slowly convergent in some cases. The combined error matrix has eigenvalues A = ^2 (n2 ±i V3"2 +2s2 V5n2 +232 ) ^j^j^jj 

J a a 2(n2+S2)(2n2 + S2) ' 

are still always less than one, but approach 1 as n2 approaches zero. The full multigrid scheme thus does not solve the 
illconditioning problem of the noise matrix. As we showed above, the generalized multiscale decomposition proposed here 1321 1 
is always well conditioned and converges by a factor of 2 every iteration for any choice of signal or noise. The computational 
cost of the multiscale method is logarithmically more expensive. 

In practice, maps often has ragged edges, defect, and other excisions, which generically give rise to large entries in the 
noise matrix, and make it ill-conditioned. For polarization maps, large regions of the signal matrix may also be very small 
(e.g. the B-mode discussed below). Multigrid methods must bin pixels with widely varying noise, which generically gives rise 
to convergence problems. The prologation of the inverse of the coarse grid operator does not necessarily cancel the fine 
grid operator L on coarse scales. We conclude that the multiscale procedure proposed here is more general than multigrid. 



6 TRACE EVALUATION 

Evaluation of Equation Q require s not only solving linear equations, but also the evaluation of a trace. We propose a rapid 
evaluation similar to that used in lOh et alJ lll999l) . Consider a vector Vi with random elements which are either +1 or -1. 
Then 

(v.vj) = S,j. (40) 
Thus, 

{w^(C^ + C'^y'C^v) = Tr[(C^ + C^)-^C,]. (41) 
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The vector equation on the left only requires solving linear systems, which we can evaluate rapidly. 

If we now construct a series of orthogonal vectors v, then an average over A'^ vectors makes 14111 exact. We construct an 
orthogonal sequence by setting the first vector to be +1 for the first half elements, and -1 for the second half, the second 
vector +1 for the first and third quartile, and -1 on the remainder, etc. We then apply a random shuffle permutation on each 
vector, which results in a random orthogonal sequence. Each of these operations is 0{N). 

For a diagonal matrix, a single iteration is always exact. For a positive definite matrix, the convergence rate is approxi- 
mately 

ir ^/n Ir 

where A'^ is the linear size of the matrix, and n is the number of iterations. If the eigenvalues are of comparable magnitude, the 
RHS of 1131 1 is oc l/V Nn. In the worst case, all eigenvalues except for one is zero, in which case the convergence is oc Xj^fn. 

When we replace the trace by a stochastic estimate as done in equation 1411 1. the corresponding Jacobian © in the 
maximum likelihood solution is no longer symmetric. The strategy to test for the sign of the eigenvalues may fail in this case, 
since eigenvalues are no longer guaranteed to be real. Empirically, we find that checking the eigenvalues of the symmetrized 
Jacobian works as a good indicator of convergence problems. When the sign of the eigenvalues is incorrect, we use the 
symmetrized eigensystem to correct the iteration direction. This symmetrized jacobian only gives first order convergence, 
instead of the second order expected from the exact Newton- Raphson. So we simply switch to the standard iteration in the 
vicinity of the solution, and find rapid convergence to machine precision. 



7 IMPLEMENTATION 

We will discuss the implementation for weak lensing and galaxy angular power spectrum analysis. Statistical weak lensing 
allows a measurement of the gravitational field of the dark matter by measuring the induced alignments of background galaxy 
shapes. Most recent analyses have relied on the correlation function, which is an inverse noise weighted quadratic estimator. 
Computing the two point correlation function is in general not an optimal estimator of the power. Being a quadratic estimator, 
it weights each data bin by only its local noise. Each correlation function bin qi is an inverse noise weighted estimator 
of the data A for a bilinear form Qc 

= A^iV^'O.Ar^'A (43) 

while an optimal estimator llUl should have used {S-\-N)~^ as its weights instead. In the limit that signal to noise is small, the 
correlation function is optimal, similarly it is optimal if the signal and noise covariance matrices commute. For weak lensing 
surveys, on small scales signal to noise is small, whil e on larger scales the coverage is reasonably uniform, so a correlation 
function is not a very poor estimator. As described in lPen et alJ i2003t) . the correlation function computation is 0(A''log A''), 
so is very fast. 

Alternatively, one can grid the data and perform optimal analysis as described in IHu fc Whit3 J200ll) . Due to the large 
cost 0{N'^), large grid cells must be chosen, leading to non-optimality in the binning. The algorithm in this paper combines 
the advantages of both approaches. 

We can consider each galaxy to have two observables, gi, g2, as well as a position. Our goal is to measure the correlations 
described as two power spectra. The noise on each data points tends to be very large, usually much larger than the signal. 
In this regime, the straight Jacobi iteration H12^ has good convergence according to equation l|15|l . and we may be able to do 
without multiscale acceleration. 

In general, the observed galaxy ellipticity is related to the reduced shear, (gi) = 7i/(l — k), which depends on both the 
shear 7 and the convergence k. In weak lensing, k ~ 0, and we equate the expectation value of the galaxy alignment with the 
shear. The covariance is 

{gi{xa)gj{xf))) = Cij{xa - X + S^pS^jg"^ /2. (44) 

The last term should be the intrinsic ellipticity of the source plane galaxy. But that is not directly observable. The observed 
ellipticity is the sum of the actual ellipticity and the lensing induced ellipticity. In weak lensing, we will ignore the induced 
ellipticity. This corresponds to forcing Cij(O) = 0, i.e. forcing the power spectrum to have zero mean. 

A generic spin two correlation function Cij can be described in terms of two power spectra. The first one, called 'electric', 
or 'div-like' describes the divergence of the polarization field, Cf", and is induced by weak lensing. The second one, called 
'magnetic' or 'curl-like' describes the curl of the polarization field, . From these, we construct two correlation functions. 
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c+{e) = I idic'^(i).h(ie) + c'^{i),h{w) 
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Virmos F14 galaxy positions 




Figure 2. Position of Iab > 22 Virmos galaxies in the 14h field. The power spectrum of dark matter in the form of weak lensing shear 
distortions is sampled at galaxy positions. This survey geometry is non-trivial at all angular scales. 



-W = ^ I ldlC'^{l)Jo{ie)-C'^{l)Ji{W) (45) 



We can express 

c+ + c_ cos(4(^) c_ sin(4(^) \ 
c- sin(4<^) c+ — c_ cos(4<j!>) J 

in terms of the angle tan~^ <f) = Ay/ Ax between the two galaxy positions. 

To test the procedure, we used a real weak lensing survey geometry as described bv lPen et alJ (|200,'^ 1 . The galaxy positions 
are shown in figure|21 which corresponds to the survey geometry. It is apparent that the geometry is highly irregular, where the 
survey geometry contributes significantly more power on all scales than the intrinsic cosmic signal. To generate the simulated 
power spectrum, we randomly rotated the actual galaxies from the survey, and added the shear field from the simulation. 
The random rotation erases all two point information. We generated a spin-2 Gaussian random fields consisting of a pure E 
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Figure 3. Angular power spectrum recovered from a mock lensing field. Tlie solid line shows tlie input power spectrum, tlie daslied line 
is the quadratic estimator of the noise. The boxes with error bars indicate the recovered power from the noisy catalogs, with the crosses 
indicating the recovered B mode. The simulations contained no B mode. 

mode, and sampled that field at the galaxy positions. The mock catalog consists of the original galaxy rotated by a random 
angle, and with its ellipticity divided by 2 to present a more significant challenge to the condition number. To this we added 
the Gaussian signal. Our multiscale procedure converged to machine precision after 8 iterations in double precision. 

Figure 13 shows the angular power spectrum recovered from a simulated galaxy survey. We see that the power spectrum 
of the noise crosses from small at large scales to dominant for I > 1000, so a pure noise weighted correlation analysis is 
non-optimal at large angular scales. 

8 CONCLUSION 

We have presented the framework to rapidly and optimally compute power spectra. Its computational cost is 0{N log N), and 
involves a stochastic evaluation of traces. The latter converges to the exact result in iterations, yielding the exact solution in 
0{N^ log Af) operations. In practice, many fewer iterations provide the answer to sufficient accuracy. The stochastic evaluation 
is also trivially run on a cluster of cheap computers, which are often readily available. 

We have presented mathematical estimates of convergence, which is good in most practical applications. For the estimation 
of weak lensing power, only a few iterations are required to converge to machine precision. 

For the upcoming lensing and CMB surveys such as the Canada-France-Hawaii- Telescope-Legacy-Survey, this rapid 
algorithm will allow an optimal analysis of the data in a tractable amount of computational effort. 

I would like to thank Yannick Mellier and Ludovic van Waerbeke for providing the Virmos galaxy data sets. 
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